431 Class 08

Thomas E. Love, Ph.D.

2023-09-21

Our Agenda

  • Ingesting the dm1000 data using read_excel()
  • Working with Categorical Data
    • Using factors and the forcats package (part of tidyverse)
    • Building tables and gt()
    • Building plots for categories with geom_bar() and geom_count()
  • Building a Model for a Scatterplot
    • Fitting a Simple Linear Regression Model
    • Power Transformations to improve Model Fit

Our R Packages

library(Epi) ## for twoby2() function
library(gt) ## for making tables
library(gtExtras) ## for fancier tables
library(readxl) ## for ingesting Excel files
library(broom)
library(janitor)
library(naniar) 
library(patchwork)
library(tidyverse) # always load tidyverse last

theme_set(theme_test()) # trying a new theme
knitr::opts_chunk$set(comment = NA)
  • As usual, #| message: false silences messages here.

Ingesting an Excel file

Today, we’ll use an Excel file (.xls, rather than .csv) to import the dm1000 data.

dm1000 <- read_excel("c08/data/dm_1000.xls") |>
  clean_names() |>
  mutate(across(where(is.character), as_factor)) |>
  mutate(subject = as.character(subject))
  • The readxl package is a non-core part of the tidyverse, and also includes functions called read_xls() and read_xlsx().
  • Visit https://readxl.tidyverse.org/.

The dm1000 tibble

dm1000
# A tibble: 1,000 × 17
   subject   age insurance  n_income    ht    wt   sbp   dbp   a1c   ldl tobacco
   <chr>   <dbl> <fct>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>  
 1 M-0001     55 Medicaid      29853  1.63 103.    145    70   6.4   221 Current
 2 M-0002     52 Commercial    31248  1.75 112.    151    77   8.5   116 Never  
 3 M-0003     69 Medicare      23362  1.65  74.9   127    73   8.9    52 Former 
 4 M-0004     57 Medicaid      26033  1.63  81.4   125    74   6.8   122 Never  
 5 M-0005     68 Medicare      85374  1.69  92.6   120    73  10.3    94 Never  
 6 M-0006     56 Medicaid      31273  1.71  54.6   127    75  12.3    NA Current
 7 M-0007     54 Commercial    25445  1.68  81.6   114    81   6.5   100 Current
 8 M-0008     45 Medicare      67526  1.69  80.6   166   110   5.4    NA Never  
 9 M-0009     61 Medicare      15203  1.91  86.7   111    77   7.1    71 Former 
10 M-0010     63 Medicaid      17628  1.86 123.    146   102   8.9    73 <NA>   
# ℹ 990 more rows
# ℹ 6 more variables: statin <dbl>, eye_exam <dbl>, race_ethnicity <fct>,
#   sex <fct>, county <fct>, residence <fct>

Dealing with Categories

A Binary Variable

dm1000 |> count(eye_exam)
# A tibble: 2 × 2
  eye_exam     n
     <dbl> <int>
1        0   438
2        1   562

Can we make this a factor?

dm1000 <- dm1000 |> 
  mutate(eye_ex1 = factor(eye_exam))

dm1000 |> count(eye_ex1, eye_exam)
# A tibble: 2 × 3
  eye_ex1 eye_exam     n
  <fct>      <dbl> <int>
1 0              0   438
2 1              1   562

Using fct_recode()

Can we recode the 0/1 values into a factor with new names?

dm1000 <- dm1000 |> 
  mutate(eye_ex = fct_recode(factor(eye_exam), "No" = "0", "Yes" = "1"))

dm1000 |> count(eye_ex, eye_exam)
# A tibble: 2 × 3
  eye_ex eye_exam     n
  <fct>     <dbl> <int>
1 No            0   438
2 Yes           1   562
dm1000 |> tabyl(eye_ex) |> adorn_pct_formatting() |> adorn_totals()
 eye_ex    n percent
     No  438   43.8%
    Yes  562   56.2%
  Total 1000       -

Counting multiple categories

dm1000 |> count(tobacco)
# A tibble: 4 × 2
  tobacco     n
  <fct>   <int>
1 Current   274
2 Never     343
3 Former    367
4 <NA>       16
dm1000 |> tabyl(tobacco) |> adorn_pct_formatting() |> adorn_totals()
 tobacco    n percent valid_percent
 Current  274   27.4%         27.8%
   Never  343   34.3%         34.9%
  Former  367   36.7%         37.3%
    <NA>   16    1.6%             -
   Total 1000       -             -
  • Does this ordering of tobacco levels make sense?

Changing Order of tobacco levels

dm1000 <- dm1000 |> 
  mutate(tobacco = fct_relevel(tobacco, "Current", "Former"))

dm1000 |> tabyl(tobacco) |> adorn_pct_formatting() |> adorn_totals()
 tobacco    n percent valid_percent
 Current  274   27.4%         27.8%
  Former  367   36.7%         37.3%
   Never  343   34.3%         34.9%
    <NA>   16    1.6%             -
   Total 1000       -             -
  • Does this order make more sense?
  • fct_relevel() and fct_recode() come from the forcats package.

Using the forcats package

  • fct_recode(): manually specify new levels for the factor
  • fct_relevel(): specify a new order manually for the levels of a factor
  • fct_reorder(): reordering a factor by another variable
  • fct_infreq(): reordering a factor by its frequency of values
  • fct_lump(): collapsing least frequent values into “other”
  • and several others

forcats references

  1. I use the forcats tools frequently in our Course Notes
  2. https://forcats.tidyverse.org/ forcats web page, especially the vignette
  3. Posit Cheat Sheet on Factors with forcats.
  4. R for Data Science on Factors

Using gt to make a table prettier

dm1000 |> tabyl(tobacco) |> adorn_pct_formatting() |> adorn_totals() |> 
  gt() |> tab_header(title = "Tobacco Status from dm1000")
Tobacco Status from dm1000
tobacco n percent valid_percent
Current 274 27.4% 27.8%
Former 367 36.7% 37.3%
Never 343 34.3% 34.9%
NA 16 1.6% -
Total 1000 - -

Table Themes in gtExtras

dm1000 |> tabyl(insurance) |> adorn_pct_formatting() |> adorn_totals() |> 
  gt() |> 
  gt_theme_nytimes() |> 
  tab_header(title = "Table styled like the New York Times")
Table styled like the New York Times
insurance n percent
Medicaid 330 33.0%
Commercial 196 19.6%
Medicare 432 43.2%
Uninsured 42 4.2%
Total 1000 -
  • There’s also a gt_theme_espn(), gt_theme_538 and several others within the gtExtras package.

geom_bar() to plot a factor

ggplot(dm1000, aes(x = tobacco)) +
  geom_bar()

Choosing colors for geom_bar

tempdat <- dm1000 |> filter(complete.cases(tobacco))

ggplot(data = tempdat, aes(x = tobacco, fill = tobacco)) +
  geom_bar() + 
  scale_fill_manual(
    values = c(Current = "tomato", Former = "orange", Never = "pink")) 

Adding Counts, Deleting Legend

tempdat <- dm1000 |> filter(complete.cases(tobacco))

ggplot(data = tempdat, aes(x = tobacco, fill = tobacco)) +
  geom_bar() + 
  geom_text(aes(y = after_stat(count), label = after_stat(count)), 
            stat = "count", vjust = 1.5, col = "black", size = 8) +
  scale_fill_manual(values = c("Current" = "tomato", 
                               Former = "orange", Never = "pink")) +
  guides(fill = "none")

Two-Way Tables

count for two variables

dm1000 |> 
  count(statin, tobacco)
# A tibble: 8 × 3
  statin tobacco     n
   <dbl> <fct>   <int>
1      0 Current    67
2      0 Former     80
3      0 Never      92
4      0 <NA>        3
5      1 Current   207
6      1 Former    287
7      1 Never     251
8      1 <NA>       13
dm1000 |> 
  count(insurance, residence)
# A tibble: 12 × 3
   insurance  residence     n
   <fct>      <fct>     <int>
 1 Medicaid   Suburbs     114
 2 Medicaid   Cleveland   201
 3 Medicaid   <NA>         15
 4 Commercial Suburbs      75
 5 Commercial Cleveland   119
 6 Commercial <NA>          2
 7 Medicare   Suburbs     163
 8 Medicare   Cleveland   259
 9 Medicare   <NA>         10
10 Uninsured  Suburbs      19
11 Uninsured  Cleveland    22
12 Uninsured  <NA>          1

Specify factor orders

dm1000 <- dm1000 |>
  mutate(residence = fct_relevel(residence, "Cleveland"),
         insurance = fct_relevel(insurance, 
                      "Medicare", "Commercial", "Medicaid"))

dm1000 |> 
  filter(complete.cases(insurance, residence)) |>
  tabyl(residence, insurance) |>
  adorn_totals(where = c("row", "col")) |> 
  gt() |>
  gt_theme_guardian()
residence Medicare Commercial Medicaid Uninsured Total
Cleveland 259 119 201 22 601
Suburbs 163 75 114 19 371
Total 422 194 315 41 972

Were suburban residents more likely to have a statin prescription?

dm1000 |> 
  filter(complete.cases(statin, residence)) |>
  tabyl(residence, statin) |>
  adorn_totals(where = c("row", "col")) |> 
  gt() |>
  gt_theme_espn()
residence 0 1 Total
Cleveland 157 444 601
Suburbs 77 294 371
Total 234 738 972
  • Can we identify a useful answer here?
  • Would it help to change the order of these variables?

Revise variable orderings

dm1000 |> filter(complete.cases(statin, residence)) |>
  mutate(statin = 
           fct_recode(factor(statin), "Statin" = "1", "No Statin" = "0"),
         statin = fct_relevel(statin, "Statin"),
         residence = fct_relevel(residence, "Suburbs")) |>
  tabyl(residence, statin) |> 
  adorn_totals(where = c("row", "col")) |> 
  gt() |> 
  gt_theme_dark()
residence Statin No Statin Total
Suburbs 294 77 371
Cleveland 444 157 601
Total 738 234 972

Table of Percentages by Residence

dm1000 |> filter(complete.cases(statin, residence)) |>
  mutate(statin = 
           fct_recode(factor(statin), "Statin" = "1", "No Statin" = "0"),
         statin = fct_relevel(statin, "Statin"),
         residence = fct_relevel(residence, "Suburbs")) |>
  tabyl(residence, statin) |> 
  adorn_percentages(denom = "row") |> adorn_pct_formatting() |>
  gt()
residence Statin No Statin
Suburbs 79.2% 20.8%
Cleveland 73.9% 26.1%

Create using table instead

tempdat1 <- dm1000 |> 
  mutate(statin = 
           fct_recode(factor(statin), "Statin" = "1", "No Statin" = "0"),
         statin = fct_relevel(statin, "Statin"),
         residence = fct_relevel(residence, "Suburbs"))

tab1 <- table(tempdat1$residence, tempdat1$statin)

tab1
           
            Statin No Statin
  Suburbs      294        77
  Cleveland    444       157

Analyzing a 2x2 table

twoby2(tab1)  # twoby2() is part of the Epi package
2 by 2 table analysis: 
------------------------------------------------------ 
Outcome   : Statin 
Comparing : Suburbs vs. Cleveland 

          Statin No Statin    P(Statin) 95% conf. interval
Suburbs      294        77       0.7925    0.7482   0.8307
Cleveland    444       157       0.7388    0.7022   0.7723

                                   95% conf. interval
             Relative Risk: 1.0727    0.9996   1.1510
         Sample Odds Ratio: 1.3501    0.9903   1.8407
Conditional MLE Odds Ratio: 1.3497    0.9805   1.8679
    Probability difference: 0.0537   -0.0018   0.1065

             Exact P-value: 0.0638 
        Asymptotic P-value: 0.0577 
------------------------------------------------------

A 3x4 two-way table

dm1000 |> filter(complete.cases(tobacco, insurance)) |>
  tabyl(tobacco, insurance) |> gt()
tobacco Medicare Commercial Medicaid Uninsured
Current 99 44 118 13
Former 183 70 103 11
Never 140 80 105 18
  • 3 rows, 4 columns: hence, this is a 3 x 4 table
  • It’s a two-way table, because we are studying the association of two variables (tobacco and insurance)
  • Compare insurance percentages by tobacco group?

Insurance rates by tobacco group

dm1000 |> filter(complete.cases(tobacco, insurance)) |>
  tabyl(tobacco, insurance) |> 
  adorn_percentages(denominator = "row") |>
  adorn_pct_formatting() |> 
  gt()
tobacco Medicare Commercial Medicaid Uninsured
Current 36.1% 16.1% 43.1% 4.7%
Former 49.9% 19.1% 28.1% 3.0%
Never 40.8% 23.3% 30.6% 5.2%
  • If you leave out adorn_pct_formatting(), you’ll get proportions, which fall between 0 and 1: multiply by 100 for percentages.

Chi-Square Test of Association?

tempdat2 <- dm1000 |> 
  filter(complete.cases(tobacco, insurance)) 

tab2 <- table(tempdat2$tobacco, tempdat2$insurance)

tab2
         
          Medicare Commercial Medicaid Uninsured
  Current       99         44      118        13
  Former       183         70      103        11
  Never        140         80      105        18
chisq.test(tab2)

    Pearson's Chi-squared test

data:  tab2
X-squared = 25.592, df = 6, p-value = 0.0002651

Three-Way Tables

Using count for three variables

dm1000 |> count(sex, statin, residence)
# A tibble: 12 × 4
   sex    statin residence     n
   <fct>   <dbl> <fct>     <int>
 1 Female      0 Cleveland    87
 2 Female      0 Suburbs      42
 3 Female      0 <NA>          4
 4 Female      1 Cleveland   245
 5 Female      1 Suburbs     160
 6 Female      1 <NA>         12
 7 Male        0 Cleveland    70
 8 Male        0 Suburbs      35
 9 Male        0 <NA>          4
10 Male        1 Cleveland   199
11 Male        1 Suburbs     134
12 Male        1 <NA>          8

A three-way table via tabyl

dm1000 |> 
  filter(complete.cases(statin, residence, sex)) |>
  tabyl(statin, residence, sex) |> 
  adorn_totals(where = c("row", "col")) |>
  adorn_title() 
$Female
        residence              
 statin Cleveland Suburbs Total
      0        87      42   129
      1       245     160   405
  Total       332     202   534

$Male
        residence              
 statin Cleveland Suburbs Total
      0        70      35   105
      1       199     134   333
  Total       269     169   438

Flattening a three-way table

ftable(dm1000$sex, dm1000$residence, dm1000$statin)
                    0   1
                         
Female Cleveland   87 245
       Suburbs     42 160
Male   Cleveland   70 199
       Suburbs     35 134
  • Note that ftable() excludes the missing residence values by default.

Plotting a 3-Way Table (Counts)

ggplot(data = filter(dm1000, complete.cases(residence)),
       aes(x = residence, y = factor(statin))) +
   geom_count() +
   facet_wrap(~ sex, labeller = "label_both")

Plotting a 3-Way Table (Jitter)

ggplot(data = filter(dm1000, complete.cases(residence)),
       aes(x = residence, y = sex)) +
   geom_jitter() +
   facet_wrap(~ statin, labeller = "label_both")

Multi-categorical 3-Way Table

dm1000 |> 
  filter(complete.cases(insurance, race_ethnicity, tobacco)) |>
  tabyl(race_ethnicity, insurance, tobacco) |> 
  adorn_totals(where = c("row", "col")) |>
  adorn_title()
$Current
                    insurance                                    
     race_ethnicity  Medicare Commercial Medicaid Uninsured Total
 Non-Hispanic Black        58         26       80         3   167
 Hispanic or Latinx         6          0        7         1    14
 Non-Hispanic White        35         17       31         9    92
 Non-Hispanic Asian         0          1        0         0     1
              Total        99         44      118        13   274

$Former
                    insurance                                    
     race_ethnicity  Medicare Commercial Medicaid Uninsured Total
 Non-Hispanic Black        96         34       66         5   201
 Hispanic or Latinx        15          7        9         2    33
 Non-Hispanic White        72         28       27         4   131
 Non-Hispanic Asian         0          1        1         0     2
              Total       183         70      103        11   367

$Never
                    insurance                                    
     race_ethnicity  Medicare Commercial Medicaid Uninsured Total
 Non-Hispanic Black        66         30       55         6   157
 Hispanic or Latinx        18          7       16         2    43
 Non-Hispanic White        51         39       29         7   126
 Non-Hispanic Asian         5          4        5         3    17
              Total       140         80      105        18   343

Multi-categorical 3-Way Counts

ggplot(data = filter(dm1000, complete.cases(tobacco)),
       aes(x = insurance, y = race_ethnicity)) +
   geom_count() +
   facet_wrap(~ tobacco, labeller = "label_both")

Multi-categorical 3-Way Jitter Plot

ggplot(data = filter(dm1000, complete.cases(tobacco)),
       aes(x = insurance, y = race_ethnicity)) +
   geom_jitter() +
   facet_wrap(~ tobacco, labeller = "label_both")

Building a Model for a Scatterplot

Scatterplot: Systolic vs. Diastolic BP

Consider SBP vs. DBP in a scatterplot, adding a linear model.

  • What did we do here with str_glue()?
dm1000_ccbp <- dm1000 |> filter(complete.cases(sbp, dbp))

ggplot(data = dm1000_ccbp, aes(x = dbp, y = sbp)) +
  geom_point(col = "gray50") +
  geom_smooth(method = "lm", se = FALSE, col = "red", formula = y ~ x) +
  theme(aspect.ratio = 1) +
  labs(x = "Diastolic BP (mm Hg)", y = "Systolic BP (mm Hg)",
       title = "SBP and DBP are positively associated",
       subtitle = str_glue(nrow(dm1000_ccbp), " Adults with Diabetes"),
       caption = str_glue("Pearson correlation (SBP, DBP) is ",
                    round_half_up(cor(dm1000_ccbp$sbp, dm1000_ccbp$dbp),3)))

Scatterplot: Systolic vs. Diastolic BP

Fitting the Linear Regression Model

m1 <- lm(sbp ~ dbp, data = dm1000_ccbp)

summary(m1)

Call:
lm(formula = sbp ~ dbp, data = dm1000_ccbp)

Residuals:
    Min      1Q  Median      3Q     Max 
-42.620 -11.392  -1.160   9.338  71.301 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 84.11467    3.09011   27.22   <2e-16 ***
dbp          0.65347    0.04093   15.96   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16.02 on 992 degrees of freedom
Multiple R-squared:  0.2044,    Adjusted R-squared:  0.2036 
F-statistic: 254.9 on 1 and 992 DF,  p-value: < 2.2e-16

Viewing the Linear Regression Model

  • tidy() comes from the broom package.
m1 <- lm(sbp ~ dbp, data = dm1000_ccbp)

tidy(m1)
# A tibble: 2 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)   84.1      3.09        27.2 2.62e-122
2 dbp            0.653    0.0409      16.0 3.10e- 51
tidy(m1, conf.int = TRUE, conf.level = 0.90) |> 
  gt() |> fmt_number(decimals = 3) |> gt_theme_espn()
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 84.115 3.090 27.221 0.000 79.027 89.202
dbp 0.653 0.041 15.964 0.000 0.586 0.721

Assessing the Linear Model

  • glance() also comes from the broom package.
m1 <- lm(sbp ~ dbp, data = dm1000_ccbp)

glance(m1)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
1     0.204         0.204  16.0      255. 3.10e-51     1 -4167. 8339. 8354.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
glance(m1) |> 
  round_half_up(digits = c(4, 4, 2, 2, 3, 0, 0, 1, 1, 0, 0, 0)) |> 
  gt() |> gt_theme_dark()
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.2044 0.2036 16.02 254.86 0 1 -4167 8339.3 8354 254610 992 994

Two m1 Residual Plots

SBP (transformed) vs. DBP

pow_1 <- ggplot(data = dm1000_ccbp, aes(x = dbp, y = 1/sbp)) +
  geom_point(col = "gray50") +
  geom_smooth(method = "loess", se = FALSE, col = "blue", formula = y ~ x) +
  geom_smooth(method = "lm", se = FALSE, col = "red", formula = y ~ x) +
  labs(title = "1/SBP vs. DBP") + theme(aspect.ratio = 1)

pow_2 <- ggplot(data = dm1000_ccbp, aes(x = dbp, y = log(sbp))) +
  geom_point(col = "gray50") +
  geom_smooth(method = "loess", se = FALSE, col = "blue", formula = y ~ x) +
  geom_smooth(method = "lm", se = FALSE, col = "red", formula = y ~ x) +
  labs(title = "log(SBP) vs. DBP") + theme(aspect.ratio = 1)

pow_3 <- ggplot(data = dm1000_ccbp, aes(x = dbp, y = sqrt(sbp))) +
  geom_point(col = "gray50") +
  geom_smooth(method = "loess", se = FALSE, col = "blue", formula = y ~ x) +
  geom_smooth(method = "lm", se = FALSE, col = "red", formula = y ~ x) +
  labs(title = "sqrt(SBP) vs. DBP") + theme(aspect.ratio = 1)

pow_4 <- ggplot(data = dm1000_ccbp, aes(x = dbp, y = sbp)) +
  geom_point(col = "gray50") +
  geom_smooth(method = "loess", se = FALSE, col = "blue", formula = y ~ x) +
  geom_smooth(method = "lm", se = FALSE, col = "red", formula = y ~ x) +
  labs(title = "SBP vs. DBP") + theme(aspect.ratio = 1)

pow_5 <- ggplot(data = dm1000_ccbp, aes(x = dbp, y = sbp^2)) +
  geom_point(col = "gray50") +
  geom_smooth(method = "loess", se = FALSE, col = "blue", formula = y ~ x) +
  geom_smooth(method = "lm", se = FALSE, col = "red", formula = y ~ x) +
  labs(title = "SBP^2 vs. DBP") + theme(aspect.ratio = 1)

pow_6 <- ggplot(data = dm1000_ccbp, aes(x = dbp, y = sbp^3)) +
  geom_point(col = "gray50") +
  geom_smooth(method = "loess", se = FALSE, col = "blue", formula = y ~ x) +
  geom_smooth(method = "lm", se = FALSE, col = "red", formula = y ~ x) +
  labs(title = "SBP^3 vs. DBP") + theme(aspect.ratio = 1)

(pow_1 + pow_2 + pow_3) / (pow_4 + pow_5 + pow_6)

SBP (transformed) vs. DBP

Ladder of Power Transformations

If \(y\) is strictly positive, we can apply transformations from this ladder to try to “linearize” the relationship between our outcome \(y\) and our predictor \(x\).

-1 0 1/2 1 2 3
\(\frac{1}{y}\) \(log(y)\) \(\sqrt{y}\) \(y\) \(y^2\) \(y^3\)
Inverse Logarithm Square Root None Square Cube

Fitting 1/SBP vs. DBP

m2 <- lm(1/sbp ~ dbp, data = dm1000_ccbp)

par(mfrow = c(1,2)); plot(m2, which = 1:2); par(mfrow = c(1,1))

Scatterplot: Weight vs. Height

dm1000_htwt <- dm1000 |> filter(complete.cases(ht, wt))

ggplot(data = dm1000_htwt, aes(x = wt, y = ht)) +
  geom_point(col = "gray50") +
  geom_smooth(method = "lm", se = FALSE, col = "red", formula = y ~ x) +
  theme(aspect.ratio = 1) +
  labs(x = "Weight (in kg)", y = "Height (in m)",
       title = "Weight and Height are positively associated",
       subtitle = str_glue(nrow(dm1000_htwt), " Adults with Diabetes"),
       caption = str_glue("Pearson correlation (Wt, Ht) is ",
                    round_half_up(cor(dm1000_htwt$wt, dm1000_htwt$ht),3)))

Scatterplot: Weight vs. Height

Weight predicted with Height?

m3 <- lm(wt ~ ht, data = dm1000_htwt)

tidy(m3, conf.int = TRUE, conf.level = 0.90) |> 
  gt() |> fmt_number(decimals = 3) |> gt_theme_dark()
term estimate std.error statistic p.value conf.low conf.high
(Intercept) −33.665 12.623 −2.667 0.008 −54.448 −12.882
ht 76.479 7.475 10.231 0.000 64.172 88.787
glance(m3) |> 
  round_half_up(digits = c(4, 4, 2, 2, 3, 0, 0, 1, 1, 0, 0, 0)) |> 
  gt() |> gt_theme_dark()
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
0.0966 0.0957 24.55 104.67 0 1 -4531 9067.7 9082.4 590031 979 981

Weight vs. Height Residual Plots

m3 <- lm(wt ~ ht, data = dm1000_ccbp)

par(mfrow = c(1,2)); plot(m3, which = 1:2); par(mfrow = c(1,1))

Weight (transformed) vs. Height

Session Information

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16 ucrt)
 os       Windows 11 x64 (build 22621)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/New_York
 date     2023-09-20
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version    date (UTC) lib source
 backports     1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
 broom       * 1.0.5      2023-06-09 [1] CRAN (R 4.3.1)
 cellranger    1.1.0      2016-07-27 [1] CRAN (R 4.3.1)
 cli           3.6.1      2023-03-23 [1] CRAN (R 4.3.1)
 cmprsk        2.2-11     2022-01-06 [1] CRAN (R 4.3.1)
 colorspace    2.1-0      2023-01-23 [1] CRAN (R 4.3.1)
 data.table    1.14.8     2023-02-17 [1] CRAN (R 4.3.1)
 digest        0.6.33     2023-07-07 [1] CRAN (R 4.3.1)
 dplyr       * 1.1.2      2023-04-20 [1] CRAN (R 4.3.1)
 ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.3.1)
 Epi         * 2.47.1     2023-04-25 [1] CRAN (R 4.3.1)
 etm           1.1.1      2020-09-08 [1] CRAN (R 4.3.1)
 evaluate      0.21       2023-05-05 [1] CRAN (R 4.3.1)
 fansi         1.0.4      2023-01-22 [1] CRAN (R 4.3.1)
 farver        2.1.1      2022-07-06 [1] CRAN (R 4.3.1)
 fastmap       1.1.1      2023-02-24 [1] CRAN (R 4.3.1)
 fontawesome   0.5.2      2023-08-19 [1] CRAN (R 4.3.1)
 forcats     * 1.0.0      2023-01-29 [1] CRAN (R 4.3.1)
 generics      0.1.3      2022-07-05 [1] CRAN (R 4.3.1)
 ggplot2     * 3.4.3      2023-08-14 [1] CRAN (R 4.3.1)
 glue          1.6.2      2022-02-24 [1] CRAN (R 4.3.1)
 gt          * 0.9.0      2023-03-31 [1] CRAN (R 4.3.1)
 gtable        0.3.4      2023-08-21 [1] CRAN (R 4.3.1)
 gtExtras    * 0.4.5      2022-11-28 [1] CRAN (R 4.3.1)
 hms           1.1.3      2023-03-21 [1] CRAN (R 4.3.1)
 htmltools     0.5.6      2023-08-10 [1] CRAN (R 4.3.1)
 janitor     * 2.2.0      2023-02-02 [1] CRAN (R 4.3.1)
 jsonlite      1.8.7      2023-06-29 [1] CRAN (R 4.3.1)
 knitr         1.43       2023-05-25 [1] CRAN (R 4.3.1)
 labeling      0.4.3      2023-08-29 [1] CRAN (R 4.3.1)
 lattice       0.21-8     2023-04-05 [2] CRAN (R 4.3.1)
 lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.3.1)
 lubridate   * 1.9.2      2023-02-10 [1] CRAN (R 4.3.1)
 magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.3.1)
 MASS          7.3-60     2023-05-04 [2] CRAN (R 4.3.1)
 Matrix        1.6-1      2023-08-14 [1] CRAN (R 4.3.1)
 mgcv          1.8-42     2023-03-02 [2] CRAN (R 4.3.1)
 munsell       0.5.0      2018-06-12 [1] CRAN (R 4.3.1)
 naniar      * 1.0.0      2023-02-02 [1] CRAN (R 4.3.1)
 nlme          3.1-162    2023-01-31 [2] CRAN (R 4.3.1)
 numDeriv      2016.8-1.1 2019-06-06 [1] CRAN (R 4.3.0)
 paletteer     1.5.0      2022-10-19 [1] CRAN (R 4.3.1)
 patchwork   * 1.1.3      2023-08-14 [1] CRAN (R 4.3.1)
 pillar        1.9.0      2023-03-22 [1] CRAN (R 4.3.1)
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.3.1)
 plyr          1.8.8      2022-11-11 [1] CRAN (R 4.3.1)
 purrr       * 1.0.2      2023-08-10 [1] CRAN (R 4.3.1)
 R6            2.5.1      2021-08-19 [1] CRAN (R 4.3.1)
 Rcpp          1.0.11     2023-07-06 [1] CRAN (R 4.3.1)
 readr       * 2.1.4      2023-02-10 [1] CRAN (R 4.3.1)
 readxl      * 1.4.3      2023-07-06 [1] CRAN (R 4.3.1)
 rematch2      2.1.2      2020-05-01 [1] CRAN (R 4.3.1)
 rlang         1.1.1      2023-04-28 [1] CRAN (R 4.3.1)
 rmarkdown     2.24       2023-08-14 [1] CRAN (R 4.3.1)
 rstudioapi    0.15.0     2023-07-07 [1] CRAN (R 4.3.1)
 sass          0.4.7      2023-07-15 [1] CRAN (R 4.3.1)
 scales        1.2.1      2022-08-20 [1] CRAN (R 4.3.1)
 sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.3.1)
 snakecase     0.11.1     2023-08-27 [1] CRAN (R 4.3.1)
 stringi       1.7.12     2023-01-11 [1] CRAN (R 4.3.0)
 stringr     * 1.5.0      2022-12-02 [1] CRAN (R 4.3.1)
 survival      3.5-5      2023-03-12 [2] CRAN (R 4.3.1)
 tibble      * 3.2.1      2023-03-20 [1] CRAN (R 4.3.1)
 tidyr       * 1.3.0      2023-01-24 [1] CRAN (R 4.3.1)
 tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.3.1)
 tidyverse   * 2.0.0      2023-02-22 [1] CRAN (R 4.3.1)
 timechange    0.2.0      2023-01-11 [1] CRAN (R 4.3.1)
 tzdb          0.4.0      2023-05-12 [1] CRAN (R 4.3.1)
 utf8          1.2.3      2023-01-31 [1] CRAN (R 4.3.1)
 vctrs         0.6.3      2023-06-14 [1] CRAN (R 4.3.1)
 visdat        0.6.0      2023-02-02 [1] CRAN (R 4.3.1)
 withr         2.5.0      2022-03-03 [1] CRAN (R 4.3.1)
 xfun          0.40       2023-08-09 [1] CRAN (R 4.3.1)
 xml2          1.3.5      2023-07-06 [1] CRAN (R 4.3.1)
 yaml          2.3.7      2023-01-23 [1] CRAN (R 4.3.0)
 zoo           1.8-12     2023-04-13 [1] CRAN (R 4.3.1)

 [1] C:/Users/thoma/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.1/library

──────────────────────────────────────────────────────────────────────────────